Face recognition is used in a variety of applications such as preventing
retail crime, unlocking phones, smart advertising, finding missing persons,
and protecting law enforcement. However, the accuracy of face recognition
techniques drops substantially under changes in the pose, illumination, and
expression of the individual. In this paper, a novel face recognition approach
based on the non-subsampled shearlet transform (NSST), histogram-based local
feature descriptors, and a convolutional neural network (CNN) is proposed.
Initially, the Viola-Jones algorithm is used for face detection and the
extracted face region is preprocessed by an image resizing operation. Then,
NSST decomposes the input image into low- and high-frequency component
images. Local feature descriptors, namely local phase quantization (LPQ) and
pyramid of histogram of oriented gradients (PHOG), together with the proposed
CNN, are used to extract features from the low-frequency component of the
NSST decomposition. The extracted features are fused to generate the feature
vector, which is classified using a support vector machine (SVM). The
efficiency of the suggested method is tested on the Olivetti Research
Laboratory (ORL), Yale, and Japanese female facial expression (JAFFE) face
databases. The experimental outcomes reveal that the suggested face
recognition method outperforms some of the state-of-the-art recognition
approaches.

1. INTRODUCTION

Face recognition has attracted considerable attention in several areas like surveillance, information
security, and entertainment [1]-[3] due to its uniqueness, low cost, and easy accessibility compared to other
biometric approaches. Face recognition is the process of recognizing an individual from the available face
database [4]. A general face recognition methodology comprises pre-processing, feature extraction, and
classification stages. The pre-processing step involves operations like image de-noising, scaling, image
registration, face detection, and normalization. In the feature extraction phase, features are obtained for
efficient image representation and visual description. Feature extraction plays a major role in computer vision
applications like face recognition [5]-[7], texture analysis [8], [9], and sketch synthesis [10]-[12]. A good
image feature should be both discriminative and robust to variations like noise and illumination
changes. The last step of the face recognition system is classification, which uses robust classifiers,
namely K-nearest neighbors (KNN), support vector machine (SVM), and extreme learning machine (ELM),
to recognize the input face.

Recently, several face recognition methods have been developed that achieve a good recognition rate under
certain constraints [13]. However, in practice, the face recognition process is affected by external
factors like illumination, occlusion, and imaging equipment, which reduce the efficiency of
the recognition system. Therefore, face recognition is still a challenging task [14].


2. RELATED WORK 

Over the last few decades, several methodologies have been developed to identify the faces in an image.
In all these methods, feature extraction plays a major role. Typically, feature extraction techniques are
categorized into subspace learning [5]-[7] and local feature descriptor [8], [9], [15], [16] methods. Principal
component analysis (PCA) and Fisher linear discriminant analysis (FLDA) are conventional subspace
learning approaches. Local linear embedding (LLE) [17], isometric feature mapping [18], and Laplacian
Eigenmap [19] are manifold learning techniques that unwrap the intrinsic low-dimensional
representation. Abusham et al. [20] demonstrated an approach to face recognition by integrating PCA and
LLE. An and Ruan [21] proposed the enhanced Fisher's linear discriminant (EFLD) method, which
outperforms the earlier algorithms. PCA reduces the dimension and eliminates correlation; however, it is not
appropriate for classification [22]. Zhou et al. [23] introduced a face recognition method based on
PCA image reconstruction and linear discriminant analysis. However, the above-mentioned methods are
computationally expensive since they rely on eigendecomposition, and they also require a lot of memory.

Compared to subspace learning methods, local feature descriptors are more efficient and robust.
They can be further classified into handcrafted and learning-based descriptors. Local binary patterns (LBP)
and Gabor wavelets are two typical handcrafted features. Ahonen et al. [24] first used LBP in face
recognition and attained promising results due to its effectiveness and simplicity [25]. Building on this
idea, several LBP variants have evolved [26]-[28]. However, handcrafted features are sensitive
to illumination variations and also lose some texture information under specific conditions. These problems
are addressed by learning-based descriptors. Among them, local quantized patterns [29] and discriminant face
descriptors [30] show good performance.

Dai et al. [31] presented a decorrelated 2D feed-forward neural network ensemble with random
weights for face recognition. Chen et al. [32] addressed the problem of multi-pose classification using 2D
Gabor features and deep belief nets. Muqeet et al. [33] utilized LBP and the directional wavelet transform
for face recognition. Tai et al. [34] proposed the orthogonal Procrustes problem (OPR) as a framework to
recognize pose-varying faces. Li et al. [35] introduced a new method to estimate the low-rank representation
for image classification. Khan et al. [36] proposed a system that can recognize faces with varying
illumination and expressions by employing particle swarm optimization (PSO). Lin et al. [37] proposed a
new dictionary learning approach for face recognition. In recent years, convolutional neural network (CNN)
methods have attracted substantial attention in face recognition. CNNs considerably enhance the
model generalization ability by using effective regularization strategies such as dropout [38]. The research
group at Facebook developed a deep learning facial recognition system named DeepFace [39]. Sun et al. [40]
proposed a CNN-based face representation named deep hidden IDentity feature (DeepID), whose features are
learned by training a group of small CNNs. Features extracted from all the CNNs are concatenated to form a
powerful feature. Yin and Liu [41] proposed multi-task learning for face recognition with illumination,
expression, and pose estimation as the side tasks. In Görgel and Simsek [42], deep stacked denoising sparse
autoencoders (DSDSA) were used for face recognition. In this paper, a new face recognition technique is
introduced that utilizes the non-subsampled shearlet transform (NSST), histogram-based local feature
descriptors, and a CNN.

The remainder of the paper is organized as follows: the proposed face recognition method is discussed in
section 3, experimental outcomes are presented in section 4, and section 5 concludes the paper.


3. PROPOSED WORK 

The proposed approach consists of five major phases: detecting a face in the input image,
preprocessing, NSST decomposition, feature extraction, and classification. Face detection removes
unwanted parts like hands, neck, and surroundings from the images, and gives the region of interest. Here the
Viola-Jones [43] algorithm is utilized for face detection. After the face region is detected, the image
resizing preprocessing operation is performed. Later, NSST is applied to the preprocessed image, and
features are extracted by using LPQ, pyramid of histogram of oriented gradients (PHOG), and the proposed
CNN. The extracted features are fused to obtain a hybrid feature vector. Finally, SVM is employed as the
classifier to recognize the face images. The whole process is shown in Figure 1.

Figure 1. Block diagram of the proposed face recognition method


3.1. Non-subsampled shearlet transform 

Traditional multiscale methods like wavelet, curvelet, and contourlet transforms are unable to
capture the anisotropic features in multidimensional data. These problems are overcome by shearlets since
they can efficiently represent multidimensional data [44]. For dimension n = 2, the
discrete shearlet system is given by (1),

$$\{\psi_{p,q,r}(x) = |\det M|^{p/2}\, \psi(S^{q} M^{p} x - r) : p, q \in \mathbb{Z},\ r \in \mathbb{Z}^{2}\} \tag{1}$$

where $\psi \in L^{2}(\mathbb{R}^{2})$ generates the group of basis functions, M is the anisotropy matrix, S is the shear
matrix, and p, q, and r are the scale, direction, and shift parameters. Both M and S are invertible matrices of size
2 x 2, and $|\det S| = 1$. For each $k > 0$ and $s \in \mathbb{R}$, the matrices M and S are given by (2),


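$$M = \begin{pmatrix} k & 0 \\ 0 & \sqrt{k} \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} \tag{2}$$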


the matrix M controls the scaling of the shearlet and S controls its orientation. For k = 9, s = 1, (2)
becomes

$$M = \begin{pmatrix} 9 & 0 \\ 0 & 3 \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \tag{3}$$


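The basic function $\hat{\psi}^{(0)}$ of the shearlet transform, for any $\beta = (\beta_1, \beta_2) \in \mathbb{R}^{2}$, $\beta_1 \neq 0$, is given by (4),

$$\hat{\psi}^{(0)}(\beta) = \hat{\psi}_1(\beta_1)\, \hat{\psi}_2(\beta_2 / \beta_1) \tag{4}$$

here $\hat{\psi}$ is the Fourier transform of $\psi$; $\hat{\psi}_1 \in C^{\infty}(\mathbb{R})$ and $\hat{\psi}_2 \in C^{\infty}(\mathbb{R})$ are both wavelets.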
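
To make the roles of M and S in (1) concrete, the following Python sketch builds both matrices and evaluates the argument of the generating function. It assumes the standard forms M = [[k, 0], [0, sqrt(k)]] and S = [[1, s], [0, 1]]; the helper names (anisotropy_matrix, shear_matrix, shearlet_argument) are illustrative only, and the sketch is restricted to non-negative integer p and q.

```python
import math

def anisotropy_matrix(k):
    """M = [[k, 0], [0, sqrt(k)]]; controls the scaling."""
    return [[float(k), 0.0], [0.0, math.sqrt(k)]]

def shear_matrix(s):
    """S = [[1, s], [0, 1]]; controls the orientation."""
    return [[1.0, float(s)], [0.0, 1.0]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(a, n):
    """a raised to a non-negative integer power n."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, a)
    return result

def det(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def shearlet_argument(x, p, q, r, k, s):
    """Evaluate S^q M^p x - r, the argument of psi in (1), for p, q >= 0."""
    t = matmul(matpow(shear_matrix(s), q), matpow(anisotropy_matrix(k), p))
    return [t[0][0] * x[0] + t[0][1] * x[1] - r[0],
            t[1][0] * x[0] + t[1][1] * x[1] - r[1]]

# Values of k and s quoted in the text.
M = anisotropy_matrix(9)
S = shear_matrix(1)
print(M, S, det(S))
print(shearlet_argument([1.0, 1.0], p=1, q=1, r=[0.0, 0.0], k=9, s=1))
```

Because det S = 1, shearing changes the orientation of the support without changing its area, while M stretches the two axes by different amounts.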

The NSST decomposition consists of multi-scale and multi-directional factorization steps.
Multi-scale factorization is achieved with the non-subsampled Laplacian pyramid (NSLP), a
dual-channel non-subsampled filter bank that separates the input image into
low- and high-frequency components. The NSLP decomposition is applied successively to the
low-frequency component, and hence the singularities in the image are found. Similarly,
multi-directional factorization is realized with improved shearing filters. In the proposed approach,
the face region is first detected and resized to 64x64, and NSST is then applied to it. Figure 2(a)
shows the input face image, Figure 2(b) gives the detected face region, and the low-frequency sub-band
component from the NSST is shown in Figure 2(c).
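
The non-subsampled idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the NSLP filter bank itself: a 3x3 mean filter (an assumption) replaces the pyramid filters, the same filter is reused at every level, and no shearing filters are applied. It shows only that each level keeps the full image size and that the bands sum back to the input.

```python
def box_blur(image):
    """3x3 mean filter with replicated borders; output keeps the input size."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            total = 0.0
            for dm in (-1, 0, 1):
                for dn in (-1, 0, 1):
                    mm = min(max(m + dm, 0), rows - 1)
                    nn = min(max(n + dn, 0), cols - 1)
                    total += image[mm][nn]
            out[m][n] = total / 9.0
    return out

def split_low_high(image):
    """One level: low = smoothed image, high = residual; no downsampling."""
    low = box_blur(image)
    high = [[image[m][n] - low[m][n] for n in range(len(image[0]))]
            for m in range(len(image))]
    return low, high

def pyramid(image, levels):
    """Repeatedly split the low band; returns (final_low, list_of_high_bands)."""
    highs = []
    low = image
    for _ in range(levels):
        low, high = split_low_high(low)
        highs.append(high)
    return low, highs

# Small synthetic image used only for illustration.
image = [[float((3 * m + 5 * n) % 7) for n in range(6)] for m in range(6)]
low, highs = pyramid(image, levels=2)
print(len(low), len(low[0]), len(highs))
```

In the proposed method only the final low-frequency band is passed on to the feature extractors.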


Figure 2. (a) input face image, (b) detected face region, (c) NSST low-frequency sub-band component 


3.2. Local phase quantization 

LPQ is a well-known local texture feature descriptor used to extract textural details that
are robust to blurring [45]. LPQ first applies the short-time Fourier transform (STFT) to obtain the phase
information at every pixel of the source image and then encodes this phase information. Finally,
it estimates the distribution of the encoded values to obtain the LPQ features. LPQ is described
mathematically as follows.

Let p(m,n) be an original image. The spatially invariant blurring of the image
p(m,n) is obtained by the convolution operation (5):


$$q(m,n) = p(m,n) \otimes h(m,n) \tag{5}$$

where $q(m,n)$ is the blurred image, $h(m,n)$ is the point spread function (PSF), and $\otimes$ represents
convolution [46]. The Fourier representation of (5) is given by (6),


$$Q(u,v) = P(u,v) \cdot H(u,v) \tag{6}$$

where $Q(u,v)$, $P(u,v)$, and $H(u,v)$ are the Fourier transforms of $q(m,n)$, $p(m,n)$, and $h(m,n)$
respectively. The phase information of the blurred image is then given by

$$\angle Q(u,v) = \angle P(u,v) + \angle H(u,v) \tag{7}$$

where $\angle Q(u,v)$, $\angle P(u,v)$, and $\angle H(u,v)$ are the phases of $q(m,n)$, $p(m,n)$, and $h(m,n)$ respectively.
When the PSF $h(m,n)$ is centrally symmetric, its Fourier transform is real-valued, so its phase takes only two
values, as in (8),

$$\angle H(u,v) = \begin{cases} 0, & \text{if } H(u,v) \geq 0 \\ \pi, & \text{otherwise} \end{cases} \tag{8}$$

thus, the phase invariance between $Q(u,v)$ and $P(u,v)$ is obtained as (9),

$$\angle Q(u,v) = \angle P(u,v), \quad \text{for all } H(u,v) \geq 0 \tag{9}$$
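
The phase relation can be checked numerically. The sketch below is a one-dimensional analogue under stated assumptions: circular convolution with a symmetric three-tap box PSF, an arbitrary test signal, and illustrative helper names (dft, circular_convolve, phase_shift). It prints, for each frequency, the value of H and the absolute phase difference between Q and P.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * u * t / n) for t in range(n))
            for u in range(n)]

def circular_convolve(p, h):
    """Circular convolution, so that Q = P * H holds exactly as in (6)."""
    n = len(p)
    return [sum(p[t] * h[(m - t) % n] for t in range(n)) for m in range(n)]

# Arbitrary test signal and a centrally symmetric PSF: h[n] == h[-n mod 8].
p = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
h = [1 / 3, 1 / 3, 0.0, 0.0, 0.0, 0.0, 0.0, 1 / 3]
q = circular_convolve(p, h)
P, H, Q = dft(p), dft(h), dft(q)

def phase_shift(u):
    """Wrapped phase difference angle(Q) - angle(P) at frequency index u."""
    return cmath.phase(Q[u] / P[u])

for u in range(len(p)):
    # H is real because h is symmetric; the shift is 0 where H > 0, pi where H < 0.
    print(u, round(H[u].real, 4), round(abs(phase_shift(u)), 4))
```

The low frequencies, where H is positive, keep the phase of the sharp signal; this is why LPQ samples the phase only at low frequency points.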


However, in LPQ the phase information is evaluated over the M x M neighborhood region of each pixel of image $q(m,n)$. To
obtain these local spectra, the STFT is estimated by (10),

$$Q(u,v) = \sum_{m \in N_m} \sum_{n \in N_n} q(m,n)\, e^{-i 2\pi (um + vn)/M} \tag{10}$$

where $N_m$ and $N_n$ indicate the neighborhood region. LPQ finds the phase information at the frequency points
$z_1 = (a,0)$, $z_2 = (0,a)$, $z_3 = (a,a)$, and $z_4 = (a,-a)$ using the STFT, where a is a small integer for which the
condition in (9) holds. The acquired results are arranged as (11)


$$V = [Q(z_1), Q(z_2), Q(z_3), Q(z_4)] \tag{11}$$

$$W = [\mathrm{Re}\{V\}, \mathrm{Im}\{V\}] \tag{12}$$


where Re{V} represents the real part of V and Im{V} denotes the imaginary part of V. The textural details
are obtained by encoding the elements of W as (13),

$$c = \sum_{i=1}^{8} k_i\, 2^{i-1} \tag{13}$$

where $k_i$ is the quantization of the $i^{th}$ element of W, given by (14)

$$k_i = \begin{cases} 1, & \text{if } W_i > 0 \\ 0, & \text{otherwise} \end{cases} \tag{14}$$


Finally, the LPQ descriptor is obtained as the histogram of the encoded values c. In the
proposed method, NSST is applied to the detected face image, and LPQ is applied to the resulting
low-frequency sub-band component to obtain blur-insensitive texture features. The detected face region from the
input face image is given in Figure 3(a), and Figure 3(b) represents the NSST low-frequency sub-band
component. Figure 3(c) and Figure 3(d) show the LPQ descriptor image and the corresponding histogram.
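
The LPQ steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a 3x3 window, a = 1, a negative exponent in the STFT kernel, and skips border pixels without a full neighbourhood; the function name lpq_histogram is hypothetical.

```python
import cmath
import math

def lpq_histogram(image, window=3, a=1):
    """256-bin LPQ histogram of a 2-D list of grey levels.

    window is the (odd) neighbourhood size M; a selects the frequency a/M
    used for the four points z1..z4.
    """
    r = window // 2
    rows, cols = len(image), len(image[0])
    freqs = [(a, 0), (0, a), (a, a), (a, -a)]  # z1, z2, z3, z4
    # STFT kernel e^{-i 2 pi (u*dm + v*dn) / M} for each frequency point.
    kernels = []
    for u, v in freqs:
        kernels.append({(dm, dn): cmath.exp(-2j * math.pi * (u * dm + v * dn) / window)
                        for dm in range(-r, r + 1) for dn in range(-r, r + 1)})
    hist = [0] * 256
    for m in range(r, rows - r):
        for n in range(r, cols - r):
            # Local spectrum at the four frequency points, as in (11).
            V = [sum(image[m + dm][n + dn] * w for (dm, dn), w in kern.items())
                 for kern in kernels]
            # Real and imaginary parts stacked, as in (12).
            W = [z.real for z in V] + [z.imag for z in V]
            # Sign quantisation (14) and binary coding (13).
            code = sum((1 if W[i] > 0 else 0) << i for i in range(8))
            hist[code] += 1
    return hist

# Small synthetic image used only for illustration.
image = [[float((7 * m + 3 * n * n) % 13) for n in range(6)] for m in range(6)]
hist = lpq_histogram(image)
print(sum(hist), len(hist))
```

Each interior pixel contributes one 8-bit code, so the descriptor length is fixed at 256 regardless of the image size.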



Figure 3. (a) detected face region, (b) NSST low-frequency sub-band component, (c) LPQ descriptor image, 
(d) histogram of (c) 

3.3. Pyramid of histogram of oriented gradients

For effective face recognition, shape information is required along with texture details. To obtain
such shape information we apply the PHOG descriptor, which is built from histogram of oriented
gradients (HOG) features and a pyramid representation of the image [47]. The HOG descriptor captures the
local shape of the objects in an image, and the pyramid representation captures the spatial layout. The image is split
into small regions (cells) and HOG features [48] are computed for every spatial region. The cells are split
recursively to preserve the local shape information. The features extracted from all the cells at one level are
integrated to form the HOG features of that level, and the levels are concatenated following the pyramid structure
to incorporate the spatial layout. The Canny edge detection algorithm is used to identify the
edges in the face image, and the face image is then split into cells following the quad-tree concept. Let
M be the number of pyramid levels, indexed l = 0, 1, ..., M-1, and N be the number of bins for the HOG features;
then the dimension of the PHOG descriptor is $N \sum_{l=0}^{M-1} 4^{l}$. In this work, we choose three levels
(l = 0, 1, 2) and eight bins, so the resulting feature vector has size 8 x (1 + 4 + 16) = 168. Figure 4(a) shows
the detected face image and Figure 4(b) represents the NSST low-frequency sub-band component. The PHOG
descriptor image and the final PHOG histogram for the corresponding input face image are given in
Figure 4(c) and Figure 4(d) respectively.
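
The dimension count and the quad-tree layout can be illustrated with a short Python sketch. It is a simplified version under stated assumptions: central-difference gradients replace the Canny edge map, orientations are unsigned (0 to 180 degrees), and the names phog_dimension and phog are illustrative.

```python
import math

def phog_dimension(levels, bins):
    """bins * sum of 4^l over l = 0 .. levels-1, e.g. 8 * (1 + 4 + 16) = 168."""
    return bins * sum(4 ** l for l in range(levels))

def phog(image, levels=3, bins=8):
    """Simplified PHOG: magnitude-weighted orientation histograms per cell."""
    rows, cols = len(image), len(image[0])
    # Central-difference gradients (zero at the border) stand in for Canny edges.
    mag = [[0.0] * cols for _ in range(rows)]
    ori = [[0] * cols for _ in range(rows)]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            gx = image[m][n + 1] - image[m][n - 1]
            gy = image[m + 1][n] - image[m - 1][n]
            mag[m][n] = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            ori[m][n] = min(int(angle / (180.0 / bins)), bins - 1)
    feature = []
    for level in range(levels):
        cells = 2 ** level  # cells per side: 1, 2, 4 (quad-tree split)
        for cm in range(cells):
            for cn in range(cells):
                hist = [0.0] * bins
                for m in range(cm * rows // cells, (cm + 1) * rows // cells):
                    for n in range(cn * cols // cells, (cn + 1) * cols // cells):
                        hist[ori[m][n]] += mag[m][n]
                feature.extend(hist)
    total = sum(feature)
    return [v / total for v in feature] if total > 0 else feature

# Small synthetic image used only for illustration.
image = [[float((m * m + 3 * n) % 11) for n in range(16)] for m in range(16)]
descriptor = phog(image)
print(phog_dimension(3, 8), len(descriptor))
```

Level 0 contributes one histogram for the whole image, level 1 four, and level 2 sixteen, which gives the 21 cells behind the 168-dimensional vector.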

Figure 4. (a) detected face region, (b) NSST low-frequency sub-band component, (c) PHOG descriptor image,
(d) histogram of (c)


3.4. Proposed convolutional neural network 

Convolutional neural networks have attained noticeable progress in image classification and they 
have been utilized in face recognition applications because they can extract robust facial features. The CNNs 
are generally made up of three types of layers namely, convolutional, pooling, and fully connected layers. A 
convolutional layer includes many convolutional kernels that are utilized to generate different feature maps. 
After each convolutional layer, a pooling layer is utilized that decreases the dimension of the feature maps 
and thus reduces the computational complexity of the CNN model. A fully connected layer considers all the 
neurons in the previous layer and associates them with every neuron of the current layer. 

The architecture of the proposed CNN is shown in Figure 5. The proposed convolutional neural 
network contains three convolutional, three pooling, and two fully connected layers. The input to the 
proposed CNN is a 64x64x1 grayscale image. The first convolutional layer has six 5x5 filters and the 
convolution stride is set to one pixel. Thus, the output of the first convolutional layer contains six feature 
maps with size 60x60. The ReLU non-linear activation function is used in the convolutional layers. After
each convolutional layer, max-pooling is performed over a 2x2 window, with stride two. Hence, the
output of maxpooling1 is feature maps with a 30x30 dimension. In each convolutional layer, the stride
is one, whereas for the max-pooling layers it is two. The depth of the second and third
convolutional layers is eight and ten, with output feature map dimensions 26x26x8 and 9x9x10 respectively.
The maxpooling2 and maxpooling3 layers generate outputs of 13x13x8 and 5x5x10 respectively. The last two
layers are fully connected layers with 200 and 120 hidden units.
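
The layer sizes quoted above can be reproduced with simple arithmetic. The sketch below assumes 5x5 filters without padding in all three convolutional layers (the text states this only for the first layer, but 30 to 26 and 13 to 9 imply it) and pooling that rounds up, which is needed for the 9x9 map to become 5x5. The function names are illustrative.

```python
import math

def conv_out(size, kernel, stride=1):
    """Output side length of a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output side length of pooling, rounding up so that 9 -> 5."""
    return math.ceil((size - window) / stride) + 1

def trace_shapes(input_size=64, depths=(6, 8, 10), kernel=5):
    """List (layer, height, width, depth) for each conv and pooling layer."""
    shapes = []
    size = input_size
    for depth in depths:
        size = conv_out(size, kernel)
        shapes.append(("conv", size, size, depth))
        size = pool_out(size)
        shapes.append(("pool", size, size, depth))
    return shapes

for layer in trace_shapes():
    print(layer)
```

The last pooled map has 5 x 5 x 10 = 250 values, which feed the fully connected layers with 200 and 120 units.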


[Figure 5 layer sequence: Input 64x64x1, Conv1 60x60x6, Pooling1 30x30x6, Conv2 26x26x8, Pooling2 13x13x8, Conv3 9x9x10, Pooling3 5x5x10, FC1 200, FC2 120]

Figure 5. Architecture of the proposed CNN


4. EXPERIMENTAL RESULTS AND DISCUSSION 

The capability of the suggested method, with different filters for the Laplacian pyramid
decomposition [49], is tested using three face databases: i) ORL [50], ii) Yale [51], and iii) Japanese female
facial expression (JAFFE) [52]. In every class of each database, 70% of the images were used for training and
the rest were used for testing. The proposed CNN was trained with stochastic gradient descent
using a base learning rate of 0.0001 and a maximum of 20 epochs.
Each experiment was repeated 10 times on the chosen datasets and the average recognition rate is reported.

The ORL database comprises 40 different subjects. Each subject has ten different images with
distinct lighting environments, facial expressions, and attributes. In total, the ORL database includes 400 images,
each with 112x92 image resolution. The Yale face database includes a total of 165 face images of 15 subjects
with 11 images per class. These images are captured under different configurations such as normal, happy,
sad, sleepy, surprised, no-glasses, wink, and center light. The JAFFE database includes face images of 10
Japanese females with seven facial expressions: surprise, happiness, sadness, anger, fear, disgust, and
neutral. It consists of 213 static grayscale images, each with 256x256 image resolution. Figure 6(a)-(c) show
images belonging to a single subject of the ORL, Yale, and JAFFE databases respectively.

The performance of LPQ, PHOG, CNN, LPQ+PHOG, LPQ+CNN, PHOG+CNN, and 
LPQ+PHOG+CNN (proposed) is evaluated on the chosen databases with two different classifiers. The 
experiments were performed by utilizing k-fold cross-validation with ‘k’ value as 5. From every input image, 
the face region is extracted using the Viola-Jones algorithm, and then it is resized to 64x64 resolution. The 
features are extracted with LPQ, PHOG, and proposed CNN. These features are concatenated and fed to 
SVM. Another traditional classifier KNN is used for the classification of the extracted features. 
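
The fusion step is a plain concatenation of the per-descriptor vectors. The sketch below illustrates it together with a nearest-neighbour vote standing in for the KNN classifier; the toy vectors, labels, and helper names (fuse, knn_predict) are hypothetical. The lengths 256 (LPQ) and 168 (PHOG) follow from the descriptors above, while taking 120 CNN features from the last fully connected layer is an assumption.

```python
import math

def fuse(*feature_vectors):
    """Concatenate per-descriptor feature vectors into one hybrid vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def knn_predict(train_vectors, train_labels, query, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    order = sorted(range(len(train_vectors)),
                   key=lambda i: math.dist(train_vectors[i], query))
    votes = {}
    for i in order[:k]:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)

# Length of the hybrid vector under the assumptions stated above.
hybrid_length = len(fuse([0.0] * 256, [0.0] * 168, [0.0] * 120))

# Toy example: two "subjects" with short made-up LPQ, PHOG, and CNN features.
train = [fuse([0.0, 0.1], [0.2], [0.0]), fuse([1.0, 0.9], [0.8], [1.0])]
labels = ["subject_a", "subject_b"]
query = fuse([0.9, 1.0], [0.7], [0.9])
print(hybrid_length, knn_predict(train, labels, query))
```

In practice the descriptors may need to be scaled to comparable ranges before concatenation so that no single descriptor dominates the distance.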

To observe the effect of various NSLP filters [49], the recognition rate of the proposed approach for
different filters is tabulated in Tables 1-4. As given in Table 1, the 'kos' filter produces a recognition rate of
97.61% with SVM and 97.45% with the KNN classifier on the ORL database. On the Yale database, the
suggested method attains a recognition rate of 97.88% with SVM and 96.52% with KNN. On the JAFFE
database, the recognition rate is 98.75% with SVM and 97.48% with KNN. With the 'pyr' filter, the
recognition rate is 99.32% with SVM and 98.67% with KNN on the ORL database, as tabulated in
Table 2. On the Yale database, the recognition rate is 98.72% with SVM and 97.85% with the KNN
classifier. On the JAFFE database, the suggested method attains a recognition accuracy of 99.45% with SVM
and 98.54% with KNN. The recognition rates of the proposed method for the 'pyrexc' and 'maxflat' filters are
tabulated in Tables 3 and 4 respectively.


Figure 6. Images belonging to a single subject of: (a) ORL, (b) Yale, and (c) JAFFE databases


From Tables 1-4, it is observed that, among the chosen NSLP filters, 'pyr' gives the best
recognition rate for the proposed technique on all three databases. The performance metrics for the proposed
method with different classifiers on the three databases are given in Table 5. The ROC curves of the proposed
face recognition system for the ORL database are shown in Figure 7(a); Figures 7(b) and 7(c) show the ROC
curves for the Yale and JAFFE databases respectively. The time required to train the proposed CNN on the ORL,
Yale, and JAFFE databases is 3.24 minutes, 1.56 minutes, and 2.35 minutes respectively. The execution time
required to recognize a probe image of the ORL, Yale, and JAFFE databases is 2 seconds, 1.2 seconds, and
1.6 seconds respectively.


Table 1. Recognition rate (%) for the proposed method with ‘kos’ filter 


Database     ORL             Yale            JAFFE
Classifier   KNN     SVM     KNN     SVM     KNN     SVM
LPQ          94.32   95.78   93.42   94.64   93.71   94.17
PHOG         92.72   93.52   92.33   93.76   92.41   92.32
CNN          94.54   96.31   95.56   96.37   95.28   96.84
LPQ+PHOG     95.46   95.92   94.37   95.27   93.36   94.58
LPQ+CNN      97.16   97.43   96.28   97.63   97.38   98.49
PHOG+CNN     96.46   96.87   96.15   97.26   96.27   97.92
Proposed     97.45   97.61   96.52   97.88   97.48   98.75


Table 2. Recognition rate (%) for the proposed method with ‘pyr’ filter 


Database     ORL             Yale            JAFFE
Classifier   KNN     SVM     KNN     SVM     KNN     SVM
LPQ          95.36   96.78   93.61   94.24   93.32   93.85
PHOG         93.53   94.52   92.43   93.66   92.42   92.25
CNN          96.43   97.91   96.21   96.55   96.72   97.41
LPQ+PHOG     96.28   96.39   94.86   95.29   94.78   94.84
LPQ+CNN      98.22   98.57   97.48   98.35   98.25   98.72
PHOG+CNN     97.24   98.82   96.47   97.38   98.14   98.26
Proposed     98.67   99.32   97.85   98.72   98.54   99.45


Table 3. Recognition rate (%) for the proposed method with ‘pyrexc’ filter 


Database     ORL             Yale            JAFFE
Classifier   KNN     SVM     KNN     SVM     KNN     SVM
LPQ          94.36   95.88   93.51   94.78   93.81   93.71
PHOG         93.42   94.53   92.33   93.56   92.43   92.25
CNN          96.25   96.58   95.24   96.34   96.29   95.62
LPQ+PHOG     95.38   96.51   94.38   95.54   94.47   94.48
LPQ+CNN      97.65   97.16   96.58   97.43   97.58   98.51
PHOG+CNN     97.29   96.97   96.14   97.23   97.36   97.46
Proposed     97.78   97.94   97.28   97.87   97.67   98.87


Table 4. Recognition rate (%) for the proposed method with ‘maxflat’ filter 


Database     ORL             Yale            JAFFE
Classifier   KNN     SVM     KNN     SVM     KNN     SVM
LPQ          94.66   95.31   93.71   94.93   93.65   93.21
PHOG         93.52   94.82   92.63   93.65   92.47   92.14
CNN          96.23   97.23   95.84   96.34   96.14   96.95
LPQ+PHOG     95.45   96.81   94.75   95.96   94.25   94.28
LPQ+CNN      96.67   97.42   96.82   97.31   97.34   97.64
PHOG+CNN     96.45   97.24   96.26   96.93   97.19   97.28
Proposed     97.13   97.83   97.24   97.92   98.45   98.95


From the values in Tables 1-4, it is inferred that the proposed technique achieves a maximum face
recognition rate of 99.32%, 98.72%, and 99.45% on the ORL, Yale, and JAFFE databases respectively with the 'pyr'
filter. To show its effectiveness, the recognition rate of the proposed face recognition system on the ORL, Yale,
and JAFFE databases is compared with some of the existing methods in Table 6 in the Appendix.


Table 5. Performance metrics for the proposed method with ‘pyr’ filter 
Database      ORL               Yale              JAFFE
Classifier    KNN      SVM      KNN      SVM      KNN      SVM
Precision     0.9833   0.9916   0.9812   0.9833   0.9925   0.9937
Recall        0.9809   0.9904   0.9793   0.9777   0.9912   0.9916
Specificity   0.8678   0.9334   0.9632   0.9761   0.9873   0.9752
F1-score      0.9821   0.9909   0.9758   0.9804   0.9895   0.9926
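
For reference, the four metrics in Table 5 can be computed from a multi-class confusion matrix as sketched below. The averaging scheme is not stated in the text, so macro-averaging over classes is an assumption here, and the function name and the small example matrix are illustrative.

```python
def macro_metrics(confusion):
    """Macro-averaged precision, recall, specificity, and F1-score.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    sums = {"precision": 0.0, "recall": 0.0, "specificity": 0.0, "f1": 0.0}
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        sums["precision"] += precision
        sums["recall"] += recall
        sums["specificity"] += specificity
        sums["f1"] += f1
    return {name: value / n for name, value in sums.items()}

# Made-up two-class confusion matrix, used only to exercise the function.
example = [[8, 2],
           [1, 9]]
metrics = macro_metrics(example)
print(metrics)
```

Each class is treated in turn as the positive class and all other classes as negative, which is the usual one-versus-rest reading of these metrics for a multi-subject recognition task.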


Int J Artif Intell, Vol. 10, No. 4, December 2021: 1079 - 1090 


ISSN: 2252-8938


[Figure 7 panels (a)-(c): ROC curves, True Positive Rate versus False Positive Rate, with one curve each for PHOG, LPQ, CNN, LPQ+PHOG, PHOG+CNN, LPQ+CNN, and Proposed]

Figure 7. ROC curves for the proposed method on: (a) ORL, (b) Yale, and (c) JAFFE databases


5. CONCLUSION


A reliable and effective face recognition system using the NSST, histogram-based local feature
descriptors, and a CNN is proposed. The significant contribution of this work is a novel method
that applies histogram-based local feature descriptors and CNN features to a transformed image for robust face
recognition. NSST decomposes the input face image into low- and high-frequency sub-band components
using the non-subsampled Laplacian pyramid. Histograms of the local feature descriptors, namely LPQ and
PHOG, and the deep features from the CNN are obtained from the low-frequency sub-band component and
concatenated to form the feature space. In the proposed method, the SVM classifier produces
better results than the KNN classifier on the chosen face databases. The experimental results reveal that the
suggested method effectively recognizes faces with different illuminations, poses, and expressions. Compared to
some of the existing approaches, the proposed method achieves a better recognition rate.

APPENDIX


Table 6. Comparison of recognition rate (%) of the proposed method with some of the existing methods 


Method                                    ORL     Yale    JAFFE
PCA [21]                                  89.50   -       -
EFLDA [21]                                93.00   -       -
CLDA [21]                                 94.06   -       -
PCA image reconstruction+LDA+SVM [23]     97.48   -       -
GFDBN [32]                                94.98   -       -
DIWTLBP [33]                              97.00   -       -
DSDSA [42]                                98.00   -       -
Proposed                                  99.32   -       -
OPR [34]                                  -       94.15   -
PLR [35]                                  -       96.23   -
Yin [41]                                  -       95.02   -
RDCDL [38]                                -       97.22   -
DSDSA [42]                                -       98.16   -
Proposed                                  -       98.72   -
FLLEPCA [20]                              -       -       94.98
Single 2D-NNRW [31]                       -       -       97.00
PSO [36]                                  -       -       98.80
Proposed                                  -       -       99.45